feat: Add lazy table loading via anndata.experimental.read_lazy #1055
+205
−70
Summary
This PR adds support for lazy loading of tables in SpatialData using anndata's experimental read_lazy() function.
Motivation
Currently, all elements in SpatialData (images, labels, points) are loaded lazily using Dask, except for tables which are always loaded into memory. For large datasets, particularly Mass Spectrometry Imaging (MSI) data where tables can contain millions of pixels with hundreds of thousands of m/z bins, this creates memory bottlenecks.
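To make the scale concrete, the dense in-memory footprint of such a table can be estimated with simple arithmetic. The sketch below uses illustrative values that are assumptions, not figures from this PR: 1,000,000 pixels, 100,000 m/z bins, and 4-byte float32 values stored densely. The helper name dense_table_bytes is hypothetical.

```python
def dense_table_bytes(n_obs: int, n_vars: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold an n_obs x n_vars matrix densely in memory."""
    return n_obs * n_vars * bytes_per_value


# Illustrative assumptions, not measurements from this PR:
n_pixels = 1_000_000   # low end of "millions of pixels"
n_mz_bins = 100_000    # low end of "hundreds of thousands of m/z bins"

size_gb = dense_table_bytes(n_pixels, n_mz_bins) / 1e9
print(f"Dense float32 table: {size_gb:.0f} GB")  # prints "Dense float32 table: 400 GB"
```

Sparse storage would reduce this figure, so it is best read as an upper bound for the assumed shape.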
Changes
- lazy: bool = False parameter to SpatialData.read() and read_zarr()
- lazy: bool = False parameter to _read_table() in io_table.py
- anndata.experimental.read_lazy() when lazy=True
- _is_lazy_anndata() helper function to detect lazy AnnData objects
- read_lazy
Usage
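A minimal usage sketch based on the parameters listed under Changes. It assumes a spatialdata build that includes this PR (the lazy keyword is not available otherwise) and anndata >= 0.12. The wrapper load_store and its reader argument are hypothetical, added only so the snippet can be exercised without a real store, and the store path is a placeholder.

```python
from typing import Any, Callable, Optional


def load_store(path: str, lazy: bool = False,
               reader: Optional[Callable[..., Any]] = None) -> Any:
    """Open a SpatialData Zarr store, forwarding the ``lazy`` flag.

    ``reader`` is a hypothetical injection point for testing. By default the
    function uses ``spatialdata.read_zarr``, which accepts ``lazy`` only in a
    build that includes this PR.
    """
    if reader is None:
        # Imported here so the module loads even without spatialdata installed.
        from spatialdata import read_zarr
        reader = read_zarr
    return reader(path, lazy=lazy)


# Intended use (requires spatialdata with this PR and an existing store):
#   sdata = load_store("path/to/data.zarr", lazy=True)  # tables not read into memory
#   sdata = load_store("path/to/data.zarr")             # lazy=False: current behavior
# Per the change list, SpatialData.read(path, lazy=True) takes the same flag.
```

Keeping lazy=False as the default means existing callers see no change in behavior unless they opt in.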
Reproducible Example
This self-contained example demonstrates lazy loading with 99% memory savings:
Expected output:
Requirements
- anndata >= 0.12 for lazy loading support
Real-world use case
This feature was developed for Thyra, a Mass Spectrometry Imaging converter. MSI datasets can have:
With lazy loading, users can work with these datasets without loading the full table into memory.
Test plan
- test_lazy_read_basic - Verify lazy=True creates a SpatialData object without errors
- test_lazy_false_loads_normally - Verify lazy=False maintains current behavior
- test_read_zarr_lazy_parameter - Verify lazy parameter is passed through correctly
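The pass-through check named test_read_zarr_lazy_parameter above can be sketched with unittest.mock. This is not the PR's test code: read_zarr_stub and _read_table_stub are hypothetical stand-ins for read_zarr() and _read_table(), and the tables/table sub-path is illustrative, so the snippet runs without spatialdata installed.

```python
from unittest import mock


def _read_table_stub(path: str, lazy: bool = False) -> dict:
    """Hypothetical stand-in for _read_table(); returns a marker dict."""
    return {"path": path, "lazy": lazy}


def read_zarr_stub(path: str, lazy: bool = False,
                   table_reader=_read_table_stub) -> dict:
    """Hypothetical stand-in for read_zarr(); forwards ``lazy`` to the table reader."""
    return {"table": table_reader(f"{path}/tables/table", lazy=lazy)}


def test_lazy_parameter_is_passed_through() -> None:
    # Wrap the stub in a Mock so the call arguments can be inspected.
    spy = mock.Mock(side_effect=_read_table_stub)
    read_zarr_stub("data.zarr", lazy=True, table_reader=spy)
    # The table reader must receive exactly the flag given to the top-level call.
    spy.assert_called_once_with("data.zarr/tables/table", lazy=True)


test_lazy_parameter_is_passed_through()
```

Against the real code, the same idea would likely patch the table reader in place rather than inject it, but that depends on how the module imports it.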